Abstract

Neuroevolution has yet to scale up to complex reinforcement learning tasks that re- 
quire large networks. Networks with many inputs (e.g. raw video) imply a very high 
dimensional search space if encoded directly. Indirect methods use a more compact 
genotype representation that is transformed into networks of potentially arbitrary 
size. In this paper, we present an indirect method where networks are encoded by 
a set of Fourier coefficients which are transformed into network weight matrices via 
an inverse Fourier-type transform. Because there often exist network solutions whose 
weight matrices contain regularity (i.e. adjacent weights are correlated), the number 
of coefficients required to represent these networks in the frequency domain is much 
smaller than the number of weights (in the same way that natural images can be 
compressed by ignoring high-frequency components). This "compressed" encoding is compared with the direct approach, in which search is conducted in weight space, on the high-dimensional octopus arm task. The results show that representing networks
in the frequency domain can reduce the search-space dimensionality by as much as 
two orders of magnitude, both accelerating convergence and yielding more general 
solutions. 

1 Introduction 

Training neural networks for reinforcement learning tasks (i.e. as value-function approxi- 
mators) is problematic because the non-stationarity of the error gradient can lead to poor 
convergence, especially if the networks are recurrent. This non-stationarity arises because the data the agent learns from depend on its own policy, which changes over time.

An alternative to training by gradient descent is to search the space of neural network policies directly via evolutionary computation. In this neuroevolutionary framework, networks are encoded either directly or indirectly as strings of values or genes, called chromosomes, and then evolved in the standard way (genetic algorithm, evolution strategies, etc.).

Direct encoding schemes employ a one-to-one mapping from genes to network parame- 
ters (e.g. connectivity pattern, synaptic weights), so that the size of the evolved networks 
is proportional to the length of the chromosomes. 

In indirect schemes, the mapping from chromosome to network can in principle be 
any computable function, allowing chromosomes of fixed size to represent networks of 
arbitrary complexity. The underlying motivation for this approach is to scale neuroevo- 
lution to problems requiring large networks such as vision (Gauci and Stanley, 2007), 
since search can be conducted in relatively low-dimensional gene space. Theoretically, the optimal or most compressed encoding is the one in which each possible network is represented by the shortest program that generates it, i.e. the one with the lowest Kolmogorov complexity (Li and Vitanyi, 1997). Although the lowest-Kolmogorov-complexity encoding is generally not computable, it can be approximated from above through a search in the space of network-computing programs (Schmidhuber, 1995, 1997) written in a universal programming language.

Less general but more practical encodings (Gauci and Stanley, 2007; Gruau, 1994; Buk et al., 2009; Buk, 2009) often lack continuity in the genotype-phenotype mapping, such that small changes to a genotype can cause large changes in its phenotype. For example, using cellular automata (Buk, 2009) or graph-based encodings (Kitano, 1990; Gruau, 1994) to generate connection patterns can produce large networks but violates this continuity condition. HyperNEAT (Gauci and Stanley, 2007), which evolves weight-generating networks using Neuro-Evolution of Augmenting Topologies (NEAT; Stanley and Miikkulainen, 2002), provides continuity while changing weights, but adding a node or a connection to the weight-generating network causes a discontinuity in the phenotype space. These discontinuities occur frequently when, e.g., NEAT in HyperNEAT is replaced with expressions constructed by genetic programming (Buk et al., 2009). Furthermore, these representations do not provide an importance ordering on the constituent genes. For example, in the case of graph encodings, one cannot gradually cut off less important parts of the graph (GP expression, NEAT network) that constructs the phenotype.

Here we present an indirect encoding scheme in which genes represent Fourier series 
coefficients, and genomes are decoded into weight matrices via an inverse Fourier-type 
transform. This means that the search is conducted in the frequency domain rather than 
the weight space (i.e. the spatio-temporal domain). Due to the equivalence between the 
two, this encoding is both complete and closed: all valid networks can be represented and 
all representations are valid networks (Kassahun et al., 2007). The encoding also provides 
continuity (small changes to a frequency coefficient cause small changes to the weight 
matrix), allows the complexity of the weight matrix to be controlled by the number of 
coefficients (importance ordering), and makes the size of the genome independent of the 
size of the network it generates. 

The intuition behind this approach is that, because real-world tasks tend to exhibit strong regularity, weights near each other in the weight matrix of a successful network will be correlated, and can therefore be represented in the frequency domain by relatively few, low-frequency coefficients. For example, if the input to a network is raw video, it is very likely that the input weights corresponding to adjacent pixels will have similar values. This is the same concept used in lossy image coding, where high-frequency coefficients containing very little information are discarded to achieve compression.

This "compressed" encoding was first introduced by Koutnik et al. (2010) where a 
version of practical universal search (Schaul and Schmidhuber, 2010) was used to discover 
minimal solutions to well-known RL benchmarks. Subsequently (Koutnik et al., 2010) 
it was used with the CoSyNE (Gomez et al., 2008) neuroevolution algorithm where the 
correlation between weights was restricted to a 2D topology. In this paper, the encod- 
ing is generalized to higher dimensional correlations that can potentially better capture 
the inherent regularities in a given environment, so that fewer coefficients are needed to 
represent successful networks (i.e. higher compression). The encoding is applied to the 
scalable octopus arm task using a variant of Natural Evolution Strategies (NES; Wierstra et al.
2008), called Separable NES (SNES; Schaul et al. 2011), which is efficient for optimizing
high-dimensional problems. Our experiments show that while the task requires networks 
with thousands of weights, it contains a high degree of redundancy that the frequency 
domain encoding can exploit to reduce the dimensionality of the search dramatically.

The next section provides a short tutorial on the Fourier transform. Section 3 describes 
the DCT network encoding and the procedure for decoding the networks. The experimental results appear in section 4, where we show how the compressed network representation can both accelerate learning and provide more robust solutions. Section 5 discusses the main contributions of the paper and provides some ideas for future research.



2 The Fourier Transform 

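Any periodic function $f(t)$ can be uniquely represented by an infinite sum of cosine and sine functions, i.e. its Fourier series:

$$f(t) = c_0 + \sum_{\omega=1}^{\infty}\left[a_\omega\cos(\omega t) + b_\omega\sin(\omega t)\right], \qquad (1)$$

where $t$ is time, $\omega$ is the frequency, and $c_0 = a_0/2$. The coefficients $a_\omega$ and $b_\omega$ specify how much of the corresponding function is in $f(t)$, and can be obtained by multiplying both sides of eq. (1) by the sinusoid of the desired frequency, integrating over one period, $[-\pi, \pi]$, and dividing by $\pi$. So for the coefficient $a_\theta$ of the cosine with frequency $\theta$:

$$\frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos(\theta t)\,dt = \frac{1}{\pi}\int_{-\pi}^{\pi}\Big(\cos(\theta t)\,c_0 + \cos(\theta t)\sum_{\omega=1}^{\infty}\left[a_\omega\cos(\omega t) + b_\omega\sin(\omega t)\right]\Big)\,dt \qquad (2)$$

$$= \frac{a_\theta}{\pi}\int_{-\pi}^{\pi}\cos(\theta t)\cos(\omega t)\,dt, \quad \theta = \omega \qquad (3)$$

$$= \frac{a_\theta}{\pi}\int_{-\pi}^{\pi}\cos^2(\theta t)\,dt \qquad (4)$$

$$= a_\theta \qquad (5)$$

(2) simplifies to (3) because sinusoids of different frequencies are orthogonal over $[-\pi, \pi]$: $\int_{-\pi}^{\pi}\cos(\omega t)\cos(\theta t)\,dt = 0$ for all $\theta \neq \omega$, and $\int_{-\pi}^{\pi}\cos(\omega t)\sin(\theta t)\,dt = 0$ for all $\theta, \omega$ (the constant term $c_0$ also integrates to zero for $\theta \geq 1$), leaving only the frequency of interest. (4) simplifies to (5) because $\int_{-\pi}^{\pi}\cos^2(\theta t)\,dt = \int_{-\pi}^{\pi}\sin^2(\theta t)\,dt = \pi$ for integer $\theta \geq 1$.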
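The orthogonality argument behind eqs. (2)-(5) can be checked numerically. The following Python sketch is an illustration added here, not part of the original method: it approximates $a_\theta$ and $b_\theta$ of a $2\pi$-periodic function with a uniform Riemann sum over one period, and the test signal `f` is a hypothetical example whose coefficients are known in advance ($c_0 = 1$, $a_2 = 3$, $b_1 = -0.5$).

```python
import math

def fourier_coefficients(f, theta, samples=1000):
    """Approximate a_theta and b_theta of a 2*pi-periodic f, following eqs. (2)-(5).

    A uniform Riemann sum over one full period is exact (up to rounding) for
    trigonometric polynomials whose degree is well below `samples`.
    """
    dt = 2 * math.pi / samples
    ts = [-math.pi + j * dt for j in range(samples)]
    a = sum(f(t) * math.cos(theta * t) for t in ts) * dt / math.pi
    b = sum(f(t) * math.sin(theta * t) for t in ts) * dt / math.pi
    return a, b

# Hypothetical test signal with known coefficients: c_0 = 1, a_2 = 3, b_1 = -0.5.
def f(t):
    return 1.0 + 3.0 * math.cos(2 * t) - 0.5 * math.sin(t)

a2, b2 = fourier_coefficients(f, 2)
a1, b1 = fourier_coefficients(f, 1)
print(round(a2, 6), round(b1, 6))  # 3.0 -0.5
```

All other coefficients (e.g. `a1` and `b2`) come out as zero up to floating-point rounding, exactly as the orthogonality relations predict.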
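The Fourier series can be extended to complex coefficients:

$$f(t) = \sum_{\omega=-\infty}^{\infty} a_\omega\,e^{i\omega t}, \qquad a_\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-i\omega t}\,dt. \qquad (6)$$

For a function periodic in $[-L/2, L/2]$ the equations become:

$$f(t) = \sum_{\omega=-\infty}^{\infty} a_\omega\,e^{i2\pi\omega t/L}, \qquad a_\omega = \frac{1}{L}\int_{-L/2}^{L/2} f(t)\,e^{-i2\pi\omega t/L}\,dt. \qquad (7)$$

The Fourier transform is then a generalization of the complex Fourier series as $L \to \infty$. The discrete $a_\omega$ is replaced with a continuous $F(k)\,dk$, while $\omega/L \to k$, and the sum is replaced with an integral:

$$f(t) = \int_{-\infty}^{\infty} F(k)\,e^{i2\pi kt}\,dk, \qquad F(k) = \int_{-\infty}^{\infty} f(t)\,e^{-i2\pi kt}\,dt. \qquad (8)$$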

In the case where there are $N$ uniformly spaced samples $x_0, \ldots, x_{N-1}$ of $f(t)$, the discrete Fourier transform (DFT)

$$X_k = \sum_{n=0}^{N-1} x_n\,e^{-i2\pi kn/N}, \qquad k = 0, \ldots, N-1 \qquad (9)$$
Figure 1: Decoding the compressed networks. The figure shows the three-step process involved in transforming a genome of frequency-domain coefficients into a recurrent neural network. First, the genome (left) is divided into k chromosomes, one for each of the weight matrices specified by the network architecture. Each chromosome is mapped, by Algorithm 1, into a coefficient array of a dimensionality specified by Ω. In this example, an RNN with two inputs and four neurons is encoded as 8 coefficients. There are k = |Ω| = 3 chromosomes and Ω = {3, 3, 2}. The second step is to apply the inverse DCT to each array to generate the weight values, which are mapped into the weight matrices in the last step.



and the inverse discrete Fourier transform

$$x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k\,e^{i2\pi kn/N}, \qquad n = 0, \ldots, N-1 \qquad (10)$$

are defined.
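A direct, O(N²) Python transcription of eqs. (9) and (10) shows that the two transforms are inverses of each other. The function names `dft` and `idft` are hypothetical; in practice a fast Fourier transform routine would be used instead of these explicit sums.

```python
import cmath

def dft(x):
    """Discrete Fourier transform, eq. (9): X_k = sum_n x_n e^{-i 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT, eq. (10): x_n = (1/N) sum_k X_k e^{i 2 pi k n / N}."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]
X = dft(x)          # X[0] is the sum of the samples
x_back = idft(X)    # recovers x up to floating-point rounding
```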

The most widely used transform in image compression is the discrete cosine transform (DCT), which uses only real-valued cosine basis functions (loosely, the real part of the DFT). The DCT is an invertible function $f: \mathbb{R}^N \to \mathbb{R}^N$ that computes a sequence of coefficients $(c_0, \ldots, c_{N-1})$ from a sequence of real numbers $(x_0, \ldots, x_{N-1})$. The common variants of the DCT (types I to IV) differ in how the boundary conditions are handled. In this paper, the Type III DCT, DCT(III), is used to transform coefficients into weight matrices. DCT(III) is, up to a scale factor, the inverse of the standard forward DCT(II) used in e.g. JPEG, and is defined as:

$$w_k = \frac{c_0}{2} + \sum_{n=1}^{N-1} c_n\cos\left[\frac{\pi}{N}\,n\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1 \qquad (11)$$

where $w_k$ is the $k$-th weight and $c_n$ is the $n$-th frequency coefficient.
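The sketch below implements eq. (11) together with a forward DCT(II), assuming the unnormalized convention (the text does not state a normalization). Under that assumption, DCT(III) undoes DCT(II) up to the factor $2/N$; `dct2` and `dct3` are hypothetical helper names.

```python
import math

def dct2(x):
    """Forward DCT-II (unnormalized): c_n = sum_k x_k cos[pi/N * n * (k + 1/2)]."""
    N = len(x)
    return [sum(x[k] * math.cos(math.pi / N * n * (k + 0.5)) for k in range(N))
            for n in range(N)]

def dct3(c):
    """DCT-III as in eq. (11): w_k = c_0/2 + sum_{n>=1} c_n cos[pi/N * n * (k + 1/2)]."""
    N = len(c)
    return [c[0] / 2 + sum(c[n] * math.cos(math.pi / N * n * (k + 0.5))
                           for n in range(1, N))
            for k in range(N)]

x = [0.3, -1.2, 2.5, 0.0, 4.1]
# With this convention, DCT-III inverts DCT-II up to the scale factor 2/N.
x_back = [2 / len(x) * v for v in dct3(dct2(x))]
```

With only the zeroth (DC) coefficient set, eq. (11) yields a constant weight vector, e.g. `dct3([2.0, 0.0, 0.0])` gives three weights equal to 1.0.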

The DCT can be performed on signals of arbitrary dimension by applying a one-dimensional transform along each dimension of the signal. For example, in a 2D image a 1D transform is first applied to the columns, and then a second 1D transform is applied to the rows of the coefficient matrix resulting from the first transform.
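This separable construction can be sketched in Python by applying the 1D transform of eq. (11) down each column and then along each row. It is an illustrative sketch that assumes the same unnormalized DCT(III) as eq. (11); `idct2` is a hypothetical name. A single non-zero DC coefficient yields a constant matrix, while a single coefficient placed in the first (row) index yields variation along that axis only.

```python
import math

def dct3(c):
    """1D DCT-III as in eq. (11) (unnormalized convention assumed)."""
    N = len(c)
    return [c[0] / 2 + sum(c[n] * math.cos(math.pi / N * n * (k + 0.5))
                           for n in range(1, N))
            for k in range(N)]

def idct2(coeffs):
    """Separable 2D inverse DCT of a list-of-rows matrix: columns first, then rows."""
    n_rows, n_cols = len(coeffs), len(coeffs[0])
    # 1D transform down every column ...
    cols = [dct3([coeffs[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    half = [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]
    # ... then along every row of the intermediate result.
    return [dct3(row) for row in half]

dc_only = [[4.0, 0.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
print(idct2(dc_only))  # [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

one_axis = [[0.0] * 4 for _ in range(3)]
one_axis[1][0] = 1.0           # lowest non-DC frequency along the rows only
varies_down = idct2(one_axis)  # each row is constant, rows differ from each other
```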

When a signal, such as a natural image, is transformed into the frequency domain, the power in the upper frequencies tends to be low (i.e. the corresponding coefficients have small values), since pixel values tend to change gradually across most of the image. Compression can be achieved by discarding these coefficients, meaning fewer bits need to be stored, and replacing them with zeros during decompression. This is the idea behind the network encoding described in the next section: if a problem can be solved by a neural network with smooth weight matrices, then, in the frequency domain, the matrices can be represented using only some of the frequencies, and therefore with fewer parameters than the number of weights in the network.
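Truncation can be illustrated on a smooth, hypothetical weight vector (a linear ramp): keep only the m lowest-frequency coefficients, zero the rest, and invert. Because the cosine basis vectors are orthogonal, the reconstruction error can only shrink as m grows, and it vanishes when all N coefficients are kept. The sketch assumes the unnormalized DCT(II)/DCT(III) pair of eq. (11), rescaled by $2/N$; all names are illustrative.

```python
import math

def dct2(x):
    N = len(x)
    return [sum(x[k] * math.cos(math.pi / N * n * (k + 0.5)) for k in range(N))
            for n in range(N)]

def dct3(c):
    N = len(c)
    return [c[0] / 2 + sum(c[n] * math.cos(math.pi / N * n * (k + 0.5))
                           for n in range(1, N))
            for k in range(N)]

def truncate_and_reconstruct(w, m):
    """Keep the m lowest-frequency DCT coefficients of w, zero the rest, and invert."""
    N = len(w)
    c = [2 / N * v for v in dct2(w)]   # scaled so that dct3 inverts exactly
    c = c[:m] + [0.0] * (N - m)        # discard the high frequencies
    return dct3(c)

def rms_error(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

# A smooth, hypothetical "weight vector": a linear ramp over 16 weights.
w = [k / 15 for k in range(16)]
errors = [rms_error(w, truncate_and_reconstruct(w, m)) for m in range(1, 17)]
```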

3 DCT Network Representation 



Networks are encoded as a string or genome, $g = \{g_1, \ldots, g_k\}$, consisting of $k$ substrings or chromosomes of real numbers representing DCT coefficients. The number of chromosomes is determined by the choice of network architecture and of the data structures used to decode the genome, specified by $\Omega = \{D_1, \ldots, D_k\}$, where $D_m$, $m = 1, \ldots, k$, is the dimensionality of the coefficient array for chromosome $m$. The total number of coefficients, $C = \sum_{m=1}^{k} |g_m|$, is user-specified and typically much smaller than the number of network weights, $N$ (for a compression ratio of $N/C$), and the coefficients are distributed evenly over the chromosomes. Which frequencies should be included in the encoding is unknown. The approach taken here restricts the search space to band-limited neural networks, where the power spectrum of the weight matrices goes to zero above a specified limit frequency, $c_\ell^m$, and chromosomes contain all frequencies up to $c_\ell^m$, $g_m = (c_0^m, \ldots, c_\ell^m)$.

Figure 1 illustrates the procedure used to decode the genomes. In this example, a fully-recurrent neural network (on the right) is represented by $k = 3$ weight matrices: one for the input layer weights, one for the recurrent weights, and one for the bias weights. The weights in each matrix are generated from a different chromosome, which is mapped into its own $D_m$-dimensional array with the same number of elements as its corresponding weight matrix; in the case shown, $\Omega = \{3, 3, 2\}$: 3D arrays for both the input and recurrent matrices, and a 2D array for the bias weights.

In previous work (Koutnik et al., 2010), the coefficient matrices were 2D, where the 
simplexes are just the secondary diagonals; starting in the top-left corner, each diagonal 
is filled alternately starting from its corners (see figure 2). However, if the task exhibits 
inherent structure that cannot be captured by low frequencies in a 2D layout, more com- 
pression can potentially be gained by organizing the coefficients in higher-dimensional 
arrays. 

Each chromosome is mapped to its coefficient array according to Algorithm 1 (figure 3), which takes a list of array dimension sizes, $d = (d_1, \ldots, d_{D_m})$, and the chromosome, $g_m$, to create a total ordering on the array elements, $e_{\xi_1, \ldots, \xi_{D_m}}$. In the first loop, the array is partitioned into $(D_m - 1)$-simplexes, where each simplex, $s_i$, contains only those elements $e$ whose Cartesian coordinates, $(\xi_1, \ldots, \xi_{D_m})$, sum to the integer $i$. The elements of simplex $s_i$ are ordered in the while loop according to their distance to the corner points, $p_i$ (i.e. those points having exactly one non-zero coordinate; see the example points for a 3D array in figure 3), which form the rows of matrix $K = [p_1, \ldots, p_{D_m}]^T$, sorted in descending order by their sole non-zero dimension size. In each loop iteration, the coordinates of the element with the smallest Euclidean distance to the selected corner are appended to the list $ind$ and removed from $s_i$. The loop terminates when $s_i$ is empty.

Algorithm 1: Coefficient mapping(g, d)

    j ← 0
    K ← sort(diag(d) − I)
    for i = 0 to Σ_{n=1}^{|d|} (d_n − 1) do
        l ← 0
        s_i ← {e | Σ_{n=1}^{|d|} ξ_n = i}
        while |s_i| > 0 do
            ind[j] ← argmin_{e ∈ s_i} ‖e − K[l++ mod |d|]‖
            s_i ← s_i \ ind[j++]
        end
    end
    for i = 0 to |ind| − 1 do
        if i < |g| then
            coeff_array[ind[i]] ← c_i
        else
            coeff_array[ind[i]] ← 0
        end
    end

Figure 2: Coefficient importance. The coefficients are ordered along the secondary diagonals in the two-dimensional case depicted here (left). Each diagonal is filled from the edges to the center, starting on the side that corresponds to the longer dimension. The complexity of the weight matrix (right) is controlled by the number of coefficients. The gray-scale levels denote the weight values (black = low, white = high). The more coefficients that are used, the more potentially complex the weight matrix.

Figure 3: Mapping the coefficients: The cuboidal array (left) is filled with the coefficients from chromosome g one simplex at a time, according to Algorithm 1, starting at the origin and moving to the opposite corner.

After all of the simplexes have been traversed, the vector $ind$ holds the ordered element coordinates. In the final loop, the array is filled with the coefficients, from low to high frequency, at the positions indicated by $ind$; the remaining positions are filled with zeros. Finally, a $D_m$-dimensional inverse DCT is applied to the array to generate the weight values, which are mapped to their positions in the corresponding 2D weight matrix. Once the $k$ chromosomes have been transformed, the network is complete.
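Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' code: coordinates are zero-based, ties between equidistant elements are broken by scan order (an assumption, since the text does not specify tie-breaking), and the coefficient array is returned as a dictionary keyed by coordinate tuples. `math.dist` requires Python 3.8 or later.

```python
import itertools
import math

def coefficient_order(dims):
    """Total ordering of array coordinates following Algorithm 1 (zero-based)."""
    D = len(dims)
    # Rows of diag(d) - I: one corner point per axis, at the far end of that axis.
    corners = []
    for axis in range(D):
        p = [0] * D
        p[axis] = dims[axis] - 1
        corners.append(tuple(p))
    # Sort by the sole non-zero coordinate, longest dimension first (stable for ties).
    corners.sort(key=max, reverse=True)
    coords = list(itertools.product(*(range(n) for n in dims)))
    ind = []
    for i in range(sum(n - 1 for n in dims) + 1):
        # Simplex s_i: all elements whose coordinates sum to i.
        simplex = [e for e in coords if sum(e) == i]
        turn = 0
        while simplex:
            corner = corners[turn % D]  # cycle through the corner points
            turn += 1
            nearest = min(simplex, key=lambda e: math.dist(e, corner))
            ind.append(nearest)
            simplex.remove(nearest)
    return ind

def fill_coefficient_array(genome, dims):
    """Place coefficients low-to-high frequency along the ordering; pad with zeros."""
    return {pos: (genome[i] if i < len(genome) else 0.0)
            for i, pos in enumerate(coefficient_order(dims))}

print(coefficient_order((3, 3))[:6])  # [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1)]
```

For a 3 × 3 array the ordering walks the anti-diagonals away from the origin, filling each one from its ends toward the center, as in figure 2; the resulting array would then be passed to a $D_m$-dimensional inverse DCT.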

The DCT network representation is not restricted to a specific class of networks, but most of the conventional perceptron-type neural networks can be represented as a special case of a fully-connected recurrent neural network (FRNN). This architecture is general enough to represent e.g. feed-forward and Jordan/Elman networks, since they are just sub-graphs of the FRNN.
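A minimal sketch of one FRNN update, assuming tanh units and a weight matrix whose columns are ordered as inputs, recurrent connections and bias (both the activation function and the column order are assumptions for illustration; the helper names are hypothetical). Zeroing the recurrent block gives a feed-forward network whose output no longer depends on the previous activations, i.e. a sub-graph of the FRNN.

```python
import math

def frnn_step(W, x, h):
    """One update of a fully-connected recurrent network with (assumed) tanh units.

    W is an n x (i + n + 1) weight matrix: i input columns, n recurrent columns
    and one bias column; x holds the i inputs, h the n previous activations.
    """
    z = list(x) + list(h) + [1.0]  # inputs, previous activations, constant bias input
    return [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W]

def zero_recurrent(W, n_inputs, n_neurons):
    """Feed-forward special case: remove every recurrent connection."""
    return [row[:n_inputs] + [0.0] * n_neurons + row[n_inputs + n_neurons:]
            for row in W]

# Hypothetical network with 3 inputs and 2 neurons.
W = [[0.1, -0.2, 0.3, 0.5, -0.4, 0.05],
     [0.2, 0.1, -0.1, -0.3, 0.6, -0.02]]
x = [1.0, 0.5, -1.0]
ff = zero_recurrent(W, 3, 2)
```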

4 Experiments 

The compressed weight space encoding was tested on evolving neural network controllers for the octopus arm problem, introduced by Yekutieli et al. (2005)¹. The octopus arm was chosen because its complexity can be scaled by increasing the arm length.



¹ This task has been used in past reinforcement learning competitions, http://rl-competition.org







Figure 4: Fully-connected recurrent neural network representation. A single-chromosome genome, (5.0, −3.3, 4.1, −9.7, −2.2), is shown decoded into three different networks. The genome is first mapped into an n × (n + i + 1) matrix (n neurons, i inputs and one bias column), which is transformed into a weight matrix via the 2D inverse DCT. The right column shows the resulting networks corresponding to each matrix. Note that the size of the network is independent of the genome length. The squares denote input units and the circles neurons; arrow thickness denotes the magnitude of a connection weight and its color the polarity (black = positive, red = negative).

4.1 Octopus-Arm Task

The octopus arm (see figure 6) consists of p compartments floating in a 2D water environ- 
ment. Each compartment has a constant volume and contains three controllable muscles 
(dorsal, transverse and ventral). The state of a compartment is described by the x,y-coordinates of two of its corners plus their corresponding x and y velocities. Together with the arm base rotation, the arm has 8p + 2 state variables and 3p + 2 control variables. The goal of the task is to reach a goal position with the tip of the arm, starting from three different initial positions, by contracting the appropriate muscles at each 1 s step of simulated time. While initial positions 2 and 3 look symmetrical, they are actually quite different due to gravity.

The number of control variables is typically reduced by aggregating them into 8 
"meta"-actions: contraction of all dorsal, all transverse, and all ventral muscles in first 
(actions 1, 2, 3) or second half of the arm (actions 4, 5, 6) plus rotation of the base in 
either direction (actions 7, 8). In the experiments, both meta-actions and raw actions are 
used. 

4.2 Neural Network Architectures 

Networks were evolved to control an n = 10 compartment arm using two different fully-connected recurrent neural network architectures: Ψ₁, with 8 neurons controlling the meta-actions, and Ψ₂, having 32 neurons, one for each primitive, non-aggregated (raw) action (see figure 5). Architecture Ψ₁ has an 8 × 82 input weight matrix, an 8 × 8 recurrent weight matrix and a bias vector of length 8, for a total of 728 weights. Architecture Ψ₂ has a 32 × 82 input weight matrix, a 32 × 32 recurrent weight matrix and a bias vector of length 32, for a total of 3680 weights.
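The weight counts above follow directly from the arm dimensions given in section 4.1; the arithmetic is sketched below with hypothetical helper names. The last printed value is the compression ratio for a 20-coefficient encoding of Ψ₂, which is the 184:1 figure reported in section 4.4.

```python
def octopus_dims(p):
    """State and control variable counts for an arm with p compartments (section 4.1)."""
    state = 8 * p + 2     # 8 per compartment plus 2 for the base rotation
    controls = 3 * p + 2  # 3 muscles per compartment plus 2 for the base rotation
    return state, controls

def frnn_weight_count(neurons, inputs):
    """Input weights + recurrent weights + biases of a fully-connected RNN."""
    return neurons * inputs + neurons * neurons + neurons

inputs, controls = octopus_dims(10)
psi1 = frnn_weight_count(8, inputs)          # meta-action network
psi2 = frnn_weight_count(controls, inputs)   # raw-action network, one neuron per control
print(inputs, controls, psi1, psi2, psi2 / 20)  # 82 32 728 3680 184.0
```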
[Figure 5 diagram: Ψ₁ and Ψ₂; labels: 32×32 recurrent connections; 32 nodes, 32 outputs (11×3 grid, 1 not used); inputs: base rotation + compartments]

Figure 5: Neural Network Architectures. Architecture Ψ₁ consists of 8 fully-connected neurons that control the arm through the meta-actions. The network is connected to 8n + 2 inputs (n stands for the number of compartments, e.g. 10). The network for raw actions, Ψ₂, has 32 neurons (in the case of 10 compartments) organized in a 3 × (n + 1) grid.



The following three schemes were used to map the genomes into the coefficient arrays (see figure 7):

1. Ω₁: the genome is mapped into a single matrix (i.e. k = 1), the inverse DCT is performed, and the matrix is split into an n × (8p + 2) matrix of input weights, an n × n weight matrix of recurrent connections and a bias vector of length n, where n is the number of neurons in the network and p is the number of arm compartments.

2. Ω₂: the genome is partitioned into k = 3 chromosomes, mapped into three arrays: (1) a 3D, n × (p + 1) × 8 array, where 8 refers to the number of state variables per compartment, (2) an n × n array for the recurrent weights of the n neurons controlling the meta-actions, and (3) a bias vector of length n.

3. Ω₃: the genome is partitioned into k = 3 chromosomes, mapped into three arrays: (1) a 4D, 8 × (p + 1) × 3 × (p + 1) array that contains input weights for a 3 × (p + 1) grid of neurons, one for each raw action, (2) a 3 × (p + 1) × 3 × (p + 1) recurrent weight array, and (3) a 3 × (p + 1) bias array. The dimension size of 3 in these arrays refers to the number of muscles per compartment.

Schemes Ω₁ and Ω₂ were used to generate Ψ₁ networks; Ω₁ and Ω₃ were used to generate Ψ₂ networks. Coefficient arrays are filled using Algorithm 1, and the weights for each compartment are placed next to the weights for the adjacent compartments in the physical arm.

Scheme Ω₁ was used by Koutnik et al. (2010), and is included here for the purpose of comparison. This is the simplest mapping, which forces a single set of coefficients (chromosome) to represent all of the network weight matrices. Scheme Ω₂ tries to capture 3D correlations between input weights, so that fewer coefficients may be required to represent the similarity not only between weights with similar function (i.e. affecting state variables near each other on the arm) within a given arm compartment (as in Ω₁), but also across compartments. The input, recurrent and bias weights are compressed separately. Ω₃ arranges the weights such that correlations between all four dimensions that uniquely specify a weight can be exploited. For example, this data structure places next to each other input weights affecting: muscles with the same function in adjacent compartments, muscles in the same compartment with different functions, the same muscle from adjacent compartments, etc.





4.3 Setup 

Indirectly encoded networks were evolved with a fixed number of coefficients, C ∈ {10, 20, 40, 80, 160, 320}, and using an incremental procedure described below, for the four configurations Ψ₁Ω₁, Ψ₁Ω₂, Ψ₂Ω₁ and Ψ₂Ω₃, where, for example, Ψ₂Ω₃ denotes the architecture that uses raw actions and is decoded using scheme Ω₃. Each of the 6 (compression ratios) × 4 (ΨΩ configurations) = 24 setups consisted of 20 runs. For comparison, directly encoded networks were also evolved, where the genomes explicitly encode the weights, for a total of 728 and 3680 genes (weights) for Ψ₁ and Ψ₂, respectively.

Networks were evolved using Separable Natural Evolution Strategies (SNES; Sun et al., 2011), an efficient variant in the NES (Wierstra et al., 2008) family of black-box optimization algorithms. In each generation the algorithm samples a population of λ ∈ ℕ individuals, computes a Monte Carlo estimate of the fitness gradient, transforms it to the natural gradient and updates the search distribution, parameterized by a mean vector, μ, and covariance matrix, σ. Adaptation of the full covariance matrix is costly because it requires computing the matrix exponential, which becomes intractable for large problems (e.g. more than 1000 parameters, whether network weights or DCT coefficients). SNES avoids this by restricting the class of search distributions to Gaussians with a diagonal covariance matrix, so that the search is performed in a predefined coordinate system. This restriction makes SNES scale linearly with the problem dimension (see Wierstra et al., 2008, for a full description of NES).

The population size λ is calculated based on the number of coefficients, C, being evolved, λ = 4 + ⌊3 log(C)⌋ + 4; the learning rates are equal, η_μ = η_σ; and each run is limited to 6000 fitness evaluations.
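The SNES update can be sketched in a few lines of pure Python. This is a minimal illustration, not the implementation used in these experiments: it uses rank-based fitness shaping and the standard SNES default hyperparameters (λ = 4 + ⌊3 ln d⌋, η_μ = 1, η_σ = (3 + ln d)/(5√d)), which are assumptions here and differ from the settings stated above; the sphere fitness is a hypothetical test problem.

```python
import math
import random

def snes(fitness, mu0, sigma0=1.0, generations=300, seed=0):
    """Minimal Separable NES (maximization) with standard default hyperparameters."""
    rng = random.Random(seed)
    d = len(mu0)
    mu = list(mu0)
    sigma = [sigma0] * d                     # diagonal of the search distribution
    lam = 4 + int(3 * math.log(d))           # default population size (assumed)
    eta_mu = 1.0
    eta_sigma = (3 + math.log(d)) / (5 * math.sqrt(d))
    # Rank-based fitness shaping: utilities sum to zero, best sample first.
    raw = [max(0.0, math.log(lam / 2 + 1) - math.log(k)) for k in range(1, lam + 1)]
    utils = [r / sum(raw) - 1.0 / lam for r in raw]
    for _ in range(generations):
        s = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(lam)]
        z = [[m + sd * e for m, sd, e in zip(mu, sigma, si)] for si in s]
        ranked = sorted(range(lam), key=lambda i: fitness(z[i]), reverse=True)
        grad_mu = [0.0] * d
        grad_sigma = [0.0] * d
        for u, i in zip(utils, ranked):
            for j in range(d):
                grad_mu[j] += u * s[i][j]
                grad_sigma[j] += u * (s[i][j] ** 2 - 1.0)
        mu = [m + eta_mu * sd * g for m, sd, g in zip(mu, sigma, grad_mu)]
        sigma = [sd * math.exp(eta_sigma / 2 * g) for sd, g in zip(sigma, grad_sigma)]
    return mu, sigma

def sphere(x):
    """Hypothetical test fitness, maximized at the origin."""
    return -sum(v * v for v in x)

mu, sigma = snes(sphere, [3.0] * 5)
```

Because only a mean and one step size per dimension are adapted, each generation costs O(λd), which is what makes SNES practical when d is in the thousands.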

Figure 6: Octopus arm: a flexible arm consisting of n compartments, each with 3 muscles, must be controlled to touch a goal location with the arm tip from 5 different initial positions. Initial positions −π/2, 0 and π/2 are used for training; −π/4 and π/4 were used for the generalization tests in section 4.5.1.

The fitness was computed as the average of the following score over three trials:

$$\max\left(1 - \frac{t\,d}{T\,D},\; 0\right) \qquad (12)$$



where t is the number of time steps before the arm touches the goal, T is the number of time steps in a trial, d is the final distance of the arm tip to the goal and D is the initial distance of the arm tip to the goal. Each of the three trials starts with the arm in a different configuration (see figure 6). This fitness measure is different from the one used by Woolley and Stanley (2010), because minimizing the integrated distance of the arm tip to the goal causes greedy behaviors. In the viscous fluid environment of the octopus arm, a greedy strategy using the shortest-length trajectory does not lead to the fastest movement: the arm has to be compressed first, and then stretched in the appropriate direction. Our fitness function favors behaviors that reach the goal within a small number of time steps.

Figure 7: Coefficient mappings. The coefficients are mapped into two network architectures, Ψ₁ and Ψ₂ (with 8 and 32 neurons, respectively), using three mappings: Ω₁ maps coefficients into a single 2D matrix, which is split into an 8 × (8p + 2) input matrix, an 8 × 8 matrix of recurrent connections and a bias vector. Alternatively, using Ω₂, the input array can be three-dimensional (p + 1 compartments × 8 neurons × 8 state variables) to respect the geometrical constraints of the input space. The network that controls raw actions is decoded after the coefficients are mapped (with Ω₃) into two four-dimensional arrays, from which the input and recurrent weights are decoded ((p + 1) × 8 inputs connected to a layer of 3 × (p + 1) neurons). In the case of Ω₂ and Ω₃, the coefficient arrays are larger than the number of weights (p compartments plus 2 state variables for the arm base), and some of the coefficients are unused, as denoted in the figures.

In all of the experiments described so far, the encoding stays fixed throughout the evolutionary run, so the outcome depends on correctly guessing the best number of coefficients. In an attempt to determine the best number of coefficients automatically, a set of 20 simulations was run using configuration Ψ₂Ω₃, where the networks are initially encoded by 10 coefficients and the number of coefficients is then incremented by 10 every 6000 evaluations. If the performance does not improve after 6 successive coefficient additions, the algorithm ends and the best number of coefficients is reported. Adding a coefficient to the network encoding means incrementing the number of dimensions of the mean, μ, and covariance, σ, vectors of the SNES search distribution.

When coefficients are added, the complexity of all k weight matrices increases. For example, a genome consisting of C = 10 coefficients is distributed into k = 3 chromosomes: g₁ = (c₀, c₁, c₂, c₃), g₂ = (c₀, c₁, c₂) and g₃ = (c₀, c₁, c₂). Additional coefficients would then be appended one at a time, cycling through the chromosomes: the first to g₂ (the first shortest chromosome), the second to g₃, the next to g₁, and so on, until all 10 new coefficients are added, resulting in chromosomes of length |g₁| = 7, |g₂| = 7 and |g₃| = 6. If a chromosome reaches a length equal to the number of weights in its corresponding weight matrix, it cannot take on any more coefficients, and any additional coefficients are distributed the same way over the other chromosomes.
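The cycling rule can be sketched as a helper that always extends the first shortest chromosome that still has room. This is a hypothetical illustration of the bookkeeping only (the function name and the capacity argument are assumptions); in the procedure described above, each added coefficient also adds one dimension to the SNES μ and σ vectors. Starting from empty chromosomes, the same rule reproduces the initial 4/3/3 split of the example.

```python
def distribute(lengths, n_new, capacities):
    """Append n_new coefficients one at a time to the first shortest non-full chromosome.

    lengths[m] is |g_m|; capacities[m] is the number of weights in weight matrix m,
    i.e. the maximum length chromosome m may reach.
    """
    lengths = list(lengths)
    for _ in range(n_new):
        open_ids = [m for m, (n, cap) in enumerate(zip(lengths, capacities)) if n < cap]
        if not open_ids:
            raise ValueError("every chromosome already has one coefficient per weight")
        target = min(open_ids, key=lambda m: lengths[m])  # ties go to the lowest index
        lengths[target] += 1
    return lengths

print(distribute([0, 0, 0], 10, [1000] * 3))   # [4, 3, 3]
print(distribute([4, 3, 3], 10, [1000] * 3))   # [7, 7, 6]
```

If a chromosome is capped (for example a hypothetical capacity of 4 for the third one), its share is redistributed over the others, as the text describes.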

In most tasks, not all input or control variables can be organized in such a way (e.g. the base rotation in the octopus arm task). In such cases, one can either use a separate weight array, or place the weights together in a large array and decode them separately. In the latter case, some values that result from the inverse DCT are not used.

4.4 Results 

Figure 8 summarizes the experimental results. Each of the three log-log plots shows the performance of each encoding for one of the three ΨΩ configurations; each curve denotes the best fitness in each generation (averaged over 20 runs). The bar graph (figure 9) shows the number of evaluations required on average for each set to reach a fitness of 0.75.

For Ψ₁Ω₁, controllers encoded indirectly by 40 coefficients or fewer (C ∈ {10, 20, 40}) reach high fitness more quickly than the directly encoded controllers. However, the final fitness after 6000 evaluations is similar across all encodings. Because the networks are relatively small (728 weights) when meta-actions are used, direct search in weight space is still efficient. When architecture Ψ₁ is decoded using Ω₂, the advantage of the indirect encoding is, surprisingly, lost. While the 3D coefficient input array would seem to offer higher compression, it turns out that the number of coefficients required to properly set the weights in this structure is so close to the number of weights in the network that nothing is gained.

For raw action control, Ψ₂, where the networks now have 3680 weights, the simple Ω₁ scheme again works well, converging 60% faster while using less than 5% as many parameters (C = 160) as the direct encoding. However, much higher compression comes from Ω₃, where correlations in all four dimensions of the arm can be captured. The direct encoding only outperforms C = 10, which does not offer enough complexity to represent successful weight matrices. With just 20 DCT coefficients, however, the compression ratio reaches 184:1, and a fitness of 0.75 is reached in only 455 evaluations, more than 11 times faster than with the direct encoding.

Figure 8: Performance results. The three log-log plots show the best fitness at each generation (averaged over 20 runs) for each encoding for a given ΨΩ configuration.

Figure 10 shows examples of weight matrices evolved for the two most successful configurations. Notice how regular the weight values are compared to the directly encoded networks. The evolved controllers exhibit quite natural-looking behavior². For example, when starting with the arm hanging down (initial state 3), the controller employs a whip-like motion to stretch the arm tip toward the goal and to overcome gravity and the resistance of the fluid environment (figure 14).

Figure 11 contains box-plots showing the median, maximum and minimum (out of 20 
independent runs) fitness found during the progress of the incremental coefficient evolution. 
With the initial 10 coefficients, the runs reach a median fitness of ≈ 0.86, but with very high variance. As coefficients are added, the median improves, peaking at C = 30, and the variance narrows to a minimum at C = 40.

4.5 Generalization 

In this section, the best controllers from the two most successful indirect encodings, Ψ₁Ω₁ and Ψ₂Ω₃, are tested in two ways to measure both the generality of the evolved behavior and that of the underlying frequency-based representation.

4.5.1 Different Starting Positions 

Controllers were re-evaluated on the task using two new starting positions, with the arm oriented in the −π/4 and π/4 directions instead of the three positions (−π/2, 0, π/2) used during evolution (see figure 6). Figure 12 shows the results of this test, comparing directly and indirectly encoded controllers. Each data point is the median fitness of the best controller from each of the 20 runs for a given number of coefficients; the boxes indicate the upper/lower quartiles and the bars the min/max values. The solid straight line is the median fitness of the directly encoded controllers; the dashed lines correspond to the upper/lower quartiles. For C ∈ {160, 320}, generalization is comparable to that of the direct encoding, but with significantly lower variance, and networks encoded with C ∈ {40, 80} generalize better than the direct nets, again with lower variance. C = 20 yields the best generalization, very consistently performing nearly as well as on the original three starting positions. The networks with higher compression (C < 160) better capture the general behavior required to reach the goal from new starting positions.

Figure 9: Performance results. The bar graph shows the number of evaluations required on average to reach a fitness of 0.75 for each set of experiments. Ψ₁Ω₁ converges faster than the direct encoding, especially for the more compressed nets (C ≤ 40), while Ψ₁Ω₂ provides no advantage. The advantage of Ω₃ is clear in the case of raw action control (Ψ₂), where Ω₁ did not reach the average fitness of 0.75 when up to 80 coefficients were used. For Ψ₂Ω₃, networks represented by just 20 coefficients (compression ratio 184:1) outperform the direct encoding both in terms of learning speed and final fitness.

² Go to http://www.idsia.ch/~koutnik/images/octopus.mp4 for a video demonstration.

4.5.2 Different Arm Lengths 

In this test, the arm length is changed from 10 compartments to between 3 and 20.
Different arm lengths mean different numbers of inputs, and consequently require
differently sized weight matrices. For the DCT-encoded nets, the size of the network is
independent of the number of coefficients, so different arm lengths can be accommodated
by modifying the size of the coefficient matrix appropriately (see figure 3). For the directly
encoded nets, however, there is no straightforward way to add structure to, or remove it
from, the network in a meaningful way.
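To illustrate why the network size is independent of the number of coefficients, the following minimal Python sketch decodes one fixed coefficient vector into weight matrices of two different sizes. It is not the authors' implementation: it assumes SciPy's orthonormal inverse DCT (scipy.fft.idctn) and fills the coefficient array along anti-diagonals, lowest frequency first; the fill order, the normalization, the helper name decode_weights and the example genome are all assumptions made for illustration.

```python
import numpy as np
from scipy.fft import idctn


def decode_weights(coeffs, shape):
    """Map a short coefficient vector onto a weight array of any shape.

    Assumptions (not taken from the paper's code): coefficients fill the
    frequency array along anti-diagonals (lowest frequencies first) and an
    orthonormal inverse DCT-II produces the weights.
    """
    # Every index of the frequency array, ordered by total frequency
    # (sum of indices), with ties broken lexicographically.
    positions = sorted(np.ndindex(*shape), key=lambda idx: (sum(idx), idx))
    if len(coeffs) > len(positions):
        raise ValueError("more coefficients than matrix entries")
    freq = np.zeros(shape)
    for value, pos in zip(coeffs, positions):
        freq[pos] = value  # all higher frequencies stay zero
    return idctn(freq, type=2, norm="ortho")


genome = np.array([0.8, -0.5, 0.3, 0.1])  # hypothetical 4-coefficient genome
small = decode_weights(genome, (5, 8))
large = decode_weights(genome, (10, 16))
print(small.shape, large.shape)  # (5, 8) (10, 16)
```

The genome length stays fixed while the decoded matrix grows, which is what allows the same genome to be decoded again for a different arm length.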

In order to compare direct and indirect nets, the direct nets were transformed into the
frequency domain by reversing the procedure depicted in figure 1. First, the network
weights are mapped to the appropriate positions in the correct number of multidimensional
arrays. The forward DCT is applied to each array, and the network is then "re-generated"
at the size appropriate for the specified arm by adjusting the size of the coefficient matrix
(padding with zeros if the matrix is enlarged) and applying the inverse DCT.
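The re-generation step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes SciPy's orthonormal DCT (scipy.fft.dctn and idctn), truncates the coefficient array when the matrix shrinks, and rescales by the square root of the size ratio so that a constant array stays constant. That rescaling is an assumption, since the text does not state a normalization; the helper name regenerate and the matrix sizes are hypothetical.

```python
import numpy as np
from scipy.fft import dctn, idctn


def regenerate(weights, new_shape):
    """Resize a weight array by passing through the DCT domain.

    Forward DCT, then zero-pad (enlarge) or truncate (shrink) the
    coefficient array, then inverse DCT. The amplitude rescaling is an
    assumption added so that a constant array stays constant.
    """
    if len(new_shape) != weights.ndim:
        raise ValueError("new_shape must have the same number of dimensions")
    coeffs = dctn(weights, type=2, norm="ortho")
    resized = np.zeros(new_shape)  # zeros = padding for new high frequencies
    # Copy the low-frequency block shared by the old and new shapes.
    common = tuple(slice(0, min(a, b)) for a, b in zip(weights.shape, new_shape))
    resized[common] = coeffs[common]
    scale = np.sqrt(np.prod(new_shape) / np.prod(weights.shape))
    return idctn(resized * scale, type=2, norm="ortho")


# Hypothetical 8 x 10 weight matrix widened to 8 x 14.
w10 = np.random.default_rng(0).normal(size=(8, 10))
w14 = regenerate(w10, (8, 14))
```

Applying regenerate with the original shape returns the original matrix (up to floating-point error), since the forward and inverse orthonormal DCTs cancel.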

The best network from each run was re-evaluated on each of the arm lengths (3-20).
The number of time steps allowed for controlling arms longer than 10 compartments was
increased linearly, up to 500 time steps for the 20-compartment arm. The closest position
of the arm tip and the time step at which the goal was reached were used to compute the
fitness. Arms whose tips moved further away from the goal were assigned zero fitness,
because the closest position (which is in fact the initial arm position) was reached in zero
time.

Figure 10: Weight matrix visualization. Each group of images shows typical evolved
weight matrices for each Ω configuration. Each row consists of an input matrix (left),
recurrent matrix (center), and bias vector (right). Colors indicate weight value: blue =
large positive; orange = large negative. For Ω¹₁, high fitness can be achieved with
very simple matrices in which the weight values change smoothly (are highly correlated),
compared to the direct approach (bottom). The 4D arrays used by Ω²₃ allow regularities
inherent in raw-action control to be captured by as few as 20 coefficients.

The results of this test are summarized in figure 13. The surface plots show the
difference between the indirect and direct encodings for each compression level and number
of compartments for (a) Ω¹₁ (meta-actions) and (b) Ω²₃ (raw actions). The elevation of
the surface above (or below) the z = 0 plane indicates how much better (or worse) the
indirect encoding is at generalizing to different arm lengths than the direct encoding (with
networks resized as described above). Although the convergence speeds of the indirect and
direct encodings were very similar for Ω¹₁ (figure 8), the indirect encodings are less
sensitive to changes in network size. The deep trough at 10 compartments in graph (a) is
due to the fact that, for this arm length (the same as was used to evolve the nets), the
direct encoding is slightly better on average than the indirect encoding (see the final
fitness in the Ω¹₁ plot in figure 8), but it cannot generalize well even to small changes in
arm length: the directly encoded solutions are overspecialized.

As with the test in section 4.5.1, the best generalization performance is obtained with
20 and 40 coefficients for both Ω¹₁ and Ω²₃. For larger numbers of coefficients,
generalization declines gradually for arm lengths of around 10, and more rapidly for
shorter arms.

5 Discussion and Future Work 

Figure 11: Incremental Coefficient Search. The box-plot shows the median, max,
min, and 25%-75% quantile fitness (20 runs) achieved for a given number of coefficients
in incremental evolution of Ω²₃ networks. The median number of coefficients beyond which
adding more coefficients does not improve the solution is 30.

The experimental results revealed that searching in the "compressed" space of Fourier
coefficients can improve search efficiency over the standard, direct search in weight space.
The frequency-domain representation exploits the correlation between weight values that
are spatially proximal in the weight matrix, thereby reducing the number of parameters
required to encode successful networks. Both fixed and incremental search in coefficient
space discovered solutions that required an order of magnitude fewer parameters than the
direct encoding for the octopus arm task, with a similar improvement in learning speed.
Perhaps more importantly, search in coefficient space also produced controllers that were
more general with respect to initial states, and more robust to changes in the environment
(the arm length). This supports the idea that band-limited networks are in some sense
simpler, and therefore less prone to overfitting.
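The effect of spatial correlation on compressibility can be checked on synthetic data. The following Python sketch assumes SciPy's orthonormal DCT; truncation_error is a hypothetical helper, and the matrices are illustrative stand-ins, not evolved weights. It keeps only the lowest-frequency block of DCT coefficients and reports the relative reconstruction error: a smooth matrix, whose neighboring entries are correlated, is reproduced almost exactly from a few coefficients, while an uncorrelated random matrix loses nearly all of its content.

```python
import numpy as np
from scipy.fft import dctn, idctn


def truncation_error(weights, keep):
    """Relative error after keeping only the keep x keep lowest-frequency
    DCT coefficients of a 2-D weight matrix (all others set to zero)."""
    coeffs = dctn(weights, type=2, norm="ortho")
    kept = np.zeros_like(coeffs)
    kept[:keep, :keep] = coeffs[:keep, :keep]
    approx = idctn(kept, type=2, norm="ortho")
    return np.linalg.norm(weights - approx) / np.linalg.norm(weights)


# Synthetic 32 x 40 matrices (illustrative, not evolved weights).
rows, cols = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 40), indexing="ij")
smooth = np.sin(2 * rows) + np.cos(3 * cols)             # neighboring entries correlated
noise = np.random.default_rng(0).normal(size=(32, 40))  # entries uncorrelated

# 36 of 1280 coefficients: small error for smooth, close to 1 for noise.
print(truncation_error(smooth, 6), truncation_error(noise, 6))
```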

The choice of encoding scheme, Ω, proved decisive in determining the amount of
compression attainable for the two network architectures. There are many possible ways to
organize the coefficients as input to the decompressor (iDCT), but the fact that even
the most naive approach, Ω₁ (where one set of coefficients is used to represent all of the
weight matrices), worked well is encouraging. The slightly more complex Ω¹₂ illustrates how
adding higher-dimensional correlations does not necessarily lead to better compression.

So, how should a good Ω be chosen? A useful default strategy may be to first identify the
high-level dimensions of the environment that partition the weights qualitatively (e.g. for
input weights: the compartment from which a connection originates, the compartment where
it terminates, the muscle it affects, and which of the eight state variables it is associated
with), and to assume that these dimensions are all correlated by arranging the coefficients
in data structures with the same number of dimensions, as was done in Ω²₃. This strategy,
though the most complex, yielded by far the most compression, with solutions having
thousands of weights being discovered by searching a space of only 20 coefficients.
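To make the Ω²₃-style strategy concrete, here is a minimal Python sketch, not the authors' code: a 20-coefficient genome fills the lowest-frequency corner of a 4-D array indexed by (source compartment, target compartment, muscle, state variable), a 4-D inverse DCT (SciPy, orthonormal) produces the weights, and the result is flattened into a 2-D input weight matrix. The sizes (10 compartments, 3 muscles per compartment, 8 state variables), the fill order, the axis convention and the helper decode_4d are assumptions for illustration only.

```python
import numpy as np
from scipy.fft import idctn

# Hypothetical sizes: compartments, muscles per compartment, state variables.
N_COMP, N_MUSCLE, N_STATE = 10, 3, 8


def decode_4d(coeffs):
    """Decode coefficients into a (neurons x inputs) input weight matrix via a
    4-D inverse DCT over (source comp., target comp., muscle, state var.)."""
    shape = (N_COMP, N_COMP, N_MUSCLE, N_STATE)
    positions = sorted(np.ndindex(*shape), key=lambda idx: (sum(idx), idx))
    freq = np.zeros(shape)
    for value, pos in zip(coeffs, positions):
        freq[pos] = value  # lowest total frequency first (assumed ordering)
    w4 = idctn(freq, type=2, norm="ortho")
    # Rows: neuron (target comp., muscle); columns: input (source comp., state).
    return w4.transpose(1, 2, 0, 3).reshape(N_COMP * N_MUSCLE, N_COMP * N_STATE)


genome = np.random.default_rng(1).normal(size=20)  # 20 coefficients
w_in = decode_4d(genome)
print(w_in.shape, w_in.size / genome.size)  # (30, 80) 120.0
```

With these hypothetical sizes, 20 coefficients generate 2,400 input weights; recurrent weights and biases could be decoded from separate arrays in the same way.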

Figure 12: Generalization: different starting positions. Controllers encoded
indirectly using from 10 to 320 coefficients (box-plots) are compared to directly encoded
controllers (horizontal lines). Data points are the median of 20 runs, the boxes indicate
the lower and upper quartiles, and the bars the minimum and maximum values.

Figure 13: Generalization: changing arm length. The best network in the final
population of each evolutionary run is tested on an arm having from 3 to 20 compartments.
The surface plots show the difference between the indirect and direct encodings for each
compression level and number of compartments for (a) Ω¹₁ (meta-actions) and (b) Ω²₃
(raw actions). The surface elevation above the "water" indicates the degree to which the
indirect encoding generalized better than the direct encoding.

It might be possible to achieve even higher compression by switching to a different basis
altogether, such as Gaussian kernels (Glasmachers et al., 2011) or wavelets. One potential
limitation of a Fourier-type basis is that if the frequency content needs to vary across
the matrix, then many coefficients will be required to represent it. This is the reason for
using multiple chromosomes per genome in our experiments. In contrast, wavelets are
designed to deal with this kind of spatial locality, and could therefore provide higher
compression by allowing all network matrices to be represented compactly by a single set of
coefficients; for example, a simple scheme like Ω²₁ could possibly compress as well as Ω²₃
while requiring less domain knowledge.
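The limitation of a Fourier-type basis can be illustrated by counting how many DCT coefficients are needed to hold 99% of a matrix's energy. In this Python sketch (synthetic matrices; coeffs_for_energy is a hypothetical helper, and SciPy's orthonormal DCT is assumed), a matrix that is smooth everywhere needs only one or two coefficients, whereas the same matrix with a high-frequency checkerboard imposed on its right half alone needs many more.

```python
import numpy as np
from scipy.fft import dctn


def coeffs_for_energy(weights, fraction=0.99):
    """Smallest number of DCT coefficients (largest first) that together hold
    `fraction` of the total energy of `weights`."""
    energy = np.sort(dctn(weights, type=2, norm="ortho").ravel() ** 2)[::-1]
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, fraction)) + 1


n = 64
cols = np.linspace(0.0, 1.0, n)
uniform = np.tile(np.cos(np.pi * cols), (n, 1))  # smooth everywhere
checker = (-1.0) ** np.add.outer(np.arange(n), np.arange(n))
mixed = uniform.copy()
mixed[:, n // 2:] *= checker[:, n // 2:]  # high frequency in the right half only

print(coeffs_for_energy(uniform), coeffs_for_energy(mixed))
```

A localized basis such as wavelets could, in principle, represent the smooth and oscillating halves separately; that comparison is not attempted here.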
Figure 14: Octopus arm visualization. Visualization of the behavior of one of the
successful controllers compressed to 40 coefficients. The motion starts from one of the
three initial states (a, b, and c). The arm base (depicted with a cross) is fixed. In the
last phase, the goal is shown as a disc. The controller uses a whip-like motion to overcome
the friction of the environment. This sequence of snapshots was captured from the video
available at http://www.idsia.ch/~koutnik/images/octopus.mp4.

In the current implementation, the network topology (number of neurons) is simply
specified by the user. However, given that the size of the weight matrices is independent
of the number of coefficients, it may be possible to optimize the topology by decoding
genomes into networks whose size is drawn from a probability mass function that is
updated each generation according to the relative performance of each topology. Future
work will begin in this direction, not only to search for parsimonious representations of
large networks, but also to determine their complexity.